Hits 1 – 6 of 6

1	The effect of domain and diacritics in Yorùbá-English neural machine translation
	Adelani, David; Ruiter, Dana; Alabi, Jesujoba; Adebonojo, Damilola; Ayeni, Adesina; Adeyemi, Mofetoluwa; Awokoya, Ayodele; España-Bonet, Cristina
	In: 18th Biennial Machine Translation Summit, Aug 2021, Orlando, United States (2021) ; https://hal.inria.fr/hal-03350967
	Abstract: Massively multilingual machine translation (MT) has shown impressive capabilities, including zero- and few-shot translation between low-resource language pairs. However, these models are often evaluated on high-resource languages with the assumption that they generalize to low-resource ones. The difficulty of evaluating MT models on low-resource pairs is often due to the lack of standardized evaluation datasets. In this paper, we present MENYO-20k, the first multi-domain parallel corpus with a special focus on clean orthography for Yorùbá-English with standardized train-test splits for benchmarking. We provide several neural MT benchmarks and compare them to the performance of popular pre-trained (massively multilingual) MT models both for the heterogeneous test set and its subdomains. Since these pre-trained models use huge amounts of data with uncertain quality, we also analyze the effect of diacritics, a major characteristic of Yorùbá, in the training data. We investigate how and when this training condition affects the final quality and intelligibility of a translation. Our models outperform massively multilingual models such as Google (+8.7 BLEU) and Facebook M2M (+9.1 BLEU) when translating to Yorùbá, setting a high-quality benchmark for future research.
	Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
	URL: https://hal.inria.fr/hal-03350967 https://hal.inria.fr/hal-03350967/document https://hal.inria.fr/hal-03350967/file/adelani_MTSummit2021.pdf
	BASE

2	The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation ...
	Adelani, David I.; Ruiter, Dana; Alabi, Jesujoba O.. - : arXiv, 2021
	BASE

3	Emoji-Based Transfer Learning for Sentiment Tasks ...
	Boy, Susann; Ruiter, Dana; Klakow, Dietrich. - : arXiv, 2021
	BASE

4	EdinSaar@WMT21: North-Germanic Low-Resource Multilingual NMT ...
	Tchistiakova, Svetlana; Alabi, Jesujoba; Chowdhury, Koel Dutta. - : arXiv, 2021
	BASE

5	Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages ...
	Ruiter, Dana; Klakow, Dietrich; van Genabith, Josef. - : arXiv, 2021
	BASE

6	Modeling Profanity and Hate Speech in Social Media with Semantic Subspaces ...
	Hahn, Vanessa; Klakow, Dietrich. In: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 2021. - : Underline Science Inc., 2021
	BASE
